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\ The new Belle <j)3'/l measurement arXiv : hep-ex/0604054, based on Dalitz analysis 

^ ■ of D ^ K?^7r+7r^ in B='= -^ d(*)K(*)^ decays, uses likelihood ratio ordering to 

' set confidence intervals in 03 and the r, 5 parameters. This is different to the 
choice made by BaBar in PRL 95, f2f802 (2005) and arXiv:hep-ex/0507101, 

^ \ and requires additional computation. This Note explains Belle's choice using a 

•/^ ■ related but simpler example: the averaging of two numbers. We find that intervals 

1^ \ calculated with likelihood ratio ordering reproduce the analytic solution to this 

~l ' problem, whereas intervals calculated by ordering according to the p.d.f. (so-called 

f^ \ Neyman intervals) do not, and show a pathology which is important in our case. 
\^ • This document is adapted from a Belle Internal Note. 
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Neyman & Feldman-Cousins intervals for a simple problem 
with an unphysical region, and an analytic solution 

Introduction 

A known pathology of frequentist methods is that they can give empty confidence in- 
tervals in some cases; more generally, they have trouble handling measurements with 
unphysical regions, such as ^ A? + S*^ > 1 in the analysis of time-dependent B'' -^ vr+vr^ 
decays. The so-called Feldman-Cousins approach was introduced (in part) to solve this 
problem from first principles. 

The following is an illustration of how this works, using a simple example of a 
measurement with an unphysical region. The example is inspired by the problem of 
extracting the parameters ((/>3, r, 5) from measurements (x+, ?/+, x_, y_) in the B — > 
I3(*)K(*)='= Dalitz analysis, and was chosen such that an "obvious" right answer already 
exists, in analytic form. 

The problem 

Consider a continuous quantity /i, of which we make two independent measurements a 
and h. Suppose that each measurement is unbiased and distributed as a Gaussian with 
known standard deviation o" = 1, so that the probability density of the pair (a, 6) is 

/(a, 6; /i) = Q{a; /i, cr = 1) • Q^h; f^, cr = 1), (1) 

where G{x; m, s) = ]— exp ( ~^^~^' \ , Given a measurement {a,b) = {aQ,bo), what 



/2^ ^^^^ V 2. 

is the confidence interval in fi for a given confidence level? 

The obvious solution to this is that the best estimate of fi, and the corresponding 

confidence intervals, are given by the simple mean and the standard error of the mean, 

^cst = ^('^0 + bo), aM = -7= ■■ (2) 

• the 68.3% (Icr) interval should be [nest — cm, fJ'est + (^m], 

• the 95.4% (2(t) interval should be [fiest — '^o'm, fJ-est + 2(TAf], 

and so on. Supposing that we're perverse enough to throw the full frequentist machinery 
at the problem, we want to see if we can recover this solution. 

In the next section we briefiy revisit the frequentist construction of confidence in- 
tervals: if you're already familiar with this (or if such details bore you), please skip 
to page 121 where we treat the default or "Neyman" implementation, followed by Feld- 
man and Cousins' likelihood ratio ordering on page El A summary comparison, and the 
application to the B^ -^ ]3(*)k(*)='= Dalitz analysis, can be found on page [71 



Frequentist confidence intervals (for revision: skip this if desired) 

If f{x; 6) is the p.d.f. for a measurement x given a parameter 9, then for a given confi- 
dence level (1 — a) we seek values xi, X2 such that 

P{xi <x<X2;9) = l-a= / drc/(x; 6), (3) 



i.e. we want a (small) probability a for the measured value x to lie outside the interval 
[a;i,a;2]. In principle this should be done for all possible 9, defining functions xi{9), 
X2{9). The result is a belt D{a) in (x, 0), as shown in figure^ For a given measurement 
xq one draws a vertical line x = xq: its intersection with D{a) is the confidence interval 
in 9, [92{xq),9i{x{))]. The method is general and can be applied to multidimensional 
parameters 6 and data x: the integral in Eq. Q is then performed over a region in x. 

If the experiment is repeated n times, 
the measurements xq and confidence in- 
tervals [92{xq),9i{xq)\ will vary, but the 
true value 9 should lie inside the interval 
in a fraction (1 — a) of cases. The ideal, 
where this holds for any 9 (in the limit 
where n — > oo), is called coverage: 

I - a = P[92{x; a) < 9 < 9i{x; a)] (4) 

For continuous / this follows from eq. (jSJ; 
for discrete /, or for cases where an ap- 
proximate method is used, eq. (0} will not 
hold in general. If the fraction is smaller 
than (1 — q) for some value 9 then the 
interval-setting method is said to under- 
cover for that value, which is bad; if larger 
than (1 — a), then it overcovers, which is 
conservative (implying some correspond- 
ing loss of power for the method) . 
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Possible experimental values x 

Figure 1: Construction of confidence inter- 
vals by forming the confidence belt D(a). 
From the PDG2004 review [Fig. 32.3]. This 
plot assumes monotonic xi_2(^)j defining 
corresponding functions ^1,2(3^)- 
Equation ^ is not sufficient to define D{a), since in general, given 9 there is more 
than one choice for {xi,X2)'. infinitely many, if / is continuous. Typically some algorithm 
is used to determine xi^2(^)5 and hence the confidence interval [92{xq),9i{xo)], for any 
measurement x = xq. For the case shown in figure ^ the special choice 

X2 — > +00 always produces an upper limit, a special interval with only an upper edge 
in 9, since for any measurement xq the interval will include values 9 ^ —00, or to 
whatever the minimum defined value might be: for example, the "90% C.L. upper 
limit S90" on a branching fraction B is the confidence interval B G [0,i39o]; 

xi — > — 00 always produces a lower limit, an interval with only a lower edge in 9, which 
will include 9 — > +C)0 (or 1, or whatever). 

In more general cases, confidence intervals need not be simply connected {i.e. they can 
have gaps, contrary to the assumption used to draw figure P); and in pathological cases 
they can be empty (because x = xq never intersects D{a)). As we'll see, empty intervals 
can occur even for the very simple case considered in this note. 



Intervals with "Neyman" ordering 

So, for any given parameter 9 we must integrate the p.d.f. /(x; 9) for the measurement x 
until we have an area equal to the confidence level (1 — a) (see eq. (ini). A straightforward 
way to do this is to set the probabilities at the boundaries xi and X2 to be the same, 
P{xi) = P[x2). For a symmetric function /(x; 9) this is gives a central interval 



P{x < xi] 9) = dxf{x; 9) = — = / dx/(x; 9) = P{x2 < x; 

J —OO ^ J X2 



(5) 



with equal area in each tail. For more general functions / or for functions in n > 1 
dimensions, we choose a domain in x beginning with the points of highest probability, 
and then including lower-probability points in turn, until the desired area is achieved. 



dx/(x; e) = l-a. 



(6) 



/>/min(«) 



This simplest kind of ordering of the points included in the integral is called by many 
people Neyman ordering, although as far as I know Neyman is responsible for the general 
method described in the previous section, not for the choice ©• 

Applying this to our /(a, h; fi) of eq. (^, we have the integral of a 2D Gaussian, 
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da db—— exp 
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[(a-^)2 + (6-^)2] 



f>fmin{oi) 



(7) 
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Figure 2: Construction of confidence intervals using "Neyman" ordering. 



which we solve as usual substituting (a — /u) = pcosip, {b — fi) = psimp: 

l-a= // dpd?/; |J(p,-0)| — exp f -^ j = / dV^ / dpp—exp' 



/>/min(Q) P<Pmax(Q) 

■ exp 



.(") 



/Omax(a) = V-21na. (8) 

For any true value 6 = p, the confidence belt Z)(a) includes all points in the measurement 
space (a, 6) within a circle of radius pmax around {p,p), 

a ID equivalent Pmax /'max ^ (^^^ below) 



0.3173 "la" 1.515 2.296 1.138 

0.1 90% CL 2.146 4.605 1.384 

0.05 95% CL 2.448 5.991 1.466 

0.0455 "2(t" 2.486 6.180 1.476 

0.0027 "3o-" 3.439 11.829 1.682 

where the p^ax correspond to the Ax^ values for iwo-dimensional confidence intervals. 
For what we think of as an na interval, pmax > na as shown in the lower-left of Figure [21 

A measured point (ao,6o) will be inside the confidence belt D{a) for values 9 = p 
where the distance y^(ao — p)'^ + {bo — pY < Pmax- We can therefore find the confidence 
interval in p by taking the intersection of the a = b line with a disk of radius pmax drawn 
around the measurement (oo,6o)- Fig. |21 showns an example where the measurement is 
3cr away from the physical line: the 68.3% and 95.4% disks do not intersect a = 6, so 
the corresponding confidence intervals are empty. This reflects the low probability for 
the measurement to be this far away from the a = b line. 

As for the 99.7% interval, our analytic solution (Eq. ©), tells us that it should be 
centered at pest = ^(oo + ^o)) with a half-width 3aM = ^cr = -^- The corresponding 
a = b line segment runs from {p^st - 3/\/2, /Xcst - 3/\/2) to (/Ucst + 3/\/2, pest + 3/\/2): 
2x3 = 6 units long. The Fig. [21 construction gives the correct central point {pest,Pest), 
but a half-width on the plane of \/Pmax ~ ^^ = 1.682, shorter than the correct value of 
3. (The half-width in p is smaller by a factor of v2, which can be [to me] confusing.) 

The use of so-called Neyman ordering can thus give us empty confidence intervals, or 
intervals that are too narrow, if the measured values ag and 6o are far apart. If, on the 
other hand, the measured values are close to each other, the interval can be too wide: for 
an ideal measurement ao = feo = fJ-, the intersection on Figure HI would be 2/>max = 6.878 
units long for a 99.7% confidence interval, to be compared with the correct value of 6.0. 
An na interval (with CL. a„) will have the correct width for data la away from the 
physical line a = b, where / = y^— 21n(ari,) — n"^'- this gives a distance I = 1.68a for a 3a 
interval. Other values are included in the table above. 

On average, the true value will in fact lie within the confidence interval 99.7%, 95.4%, 
68.3% (or whatever) of the time, as it must by construction. But in some cases, where 
the interval is empty, we know that the true value is not inside the interval, and so the 
exercise has turned out to be useless for our purposes. Therefore, while the construction 
is "correct" in a technical sense, it is clearly pathological. 



Intervals with likelihood-ratio ordering (a.k.a. "Feldman-Cousins" ) 



A different way of choosing the integration domain in x (x = (a, 6), in our example) 
was advocated in a paper by Feldman and Cousins. One can actually find the principle 
written down in an (old) standard reference, but until recently it does not seem to have 
been implemented. The argument is as follows: 

Given a parameter 6 for any point x we have a decision to make: does this point 
belong in the confidence belt, or not? The appropriate way to make such a decision is 
based not on a likelihood but on a likelihood ratio^ 



X 



/(x; e) 



/(x; 0bcst(x)) 



(9) 



where we compare the likelihood of the parameter 9 given data x, to the likelihood of 
the best possible parameter for those data, 0best(x).^ If /(x; 0) is small compared to 
/(x; 0best)) then the parameter 9 is relatively unlikely, given the data. However if both 
/(x; 6) and /(x; 0best) are small, and their ratio A ~ 1, then the low likelihood for 6 
does not matter: even the best parameter value ^best is unlikely, so there is no reason 
to exclude this point from the confidence belt. 

Using likelihood ratio ordering we choose a domain in x beginning with the points of 
highest likelihood ratio A from Eq. Q, and then including lower- A points in turn, until 
the desired area is achieved, 



dx/(x; e) = l-a. 



(10) 



A>Amin(a) 



In general, equation Q requires a minimization step for each x, and thus more compu- 
tation than ordering by probability a la "Neyman" . A strategy to minimise computation 
may be necessary to make the method tractable, although the difficulty should not be 
exaggerated: it has been done at Belle in a number of seemingly difficult cases. 
Our problem is simple enough to handle as a high-school exercise: from Eq. (^, 



/(a, b; t^) = ^ exp (- [(a - (if + (b - fif] /2) , 
1 



2;^^"P 



2^'''^ 



({n + v}/V2-f?j + (^{u - v}/V2 - f?j ' 



/2 



/'(-u, v; [i) = g{u; V2 ■ fi, a = I) ■ g{v; 0, a = I) 



(11) 



where we use rotated coordinates 



u 



i=(a + 6), . = i=(a 



b). (12) 

In evaluating Eq. ©, we compare f'{u,v) with the best-fitting value at that point, 

^Note the distinction: The likelihood of the parameter given the data is numerically equal to the 
probability of the data, given the appropriate value of the true parameter. In frequentist statistics the 
true parameter "is what it is" and it is not sensible to assign a probability to it. 



fhest{u,v) = Q{u; \/2 • ^bcst, 1) • Q{v; 0, 1) 
= g{u;u, l)-G{v; 0, 1) 
= l-e(^;0, 1), 

since the best-fitting estimate of tlie parameter at (u, v) = (a, h) is 

/^bcst = /^est = ^[a + b) = -j=u-, 

where the Gaussian reaches its peak. The hkelihood ratio is thus 

g{u- V2-fi, l)-g(t;;0, 1) 



A 



l-g(^;0, 1) 



g{u; V2-fi,l) 



(13) 



(14) 



(15) 



and the orthogonal coordinate v drops out of the problem. For any true parameter 
6 = fj,, then, the confidence belt is given by the one- dimensional integral 



u=V2(fM+S) 



/ du Q{u; V2 ■ n, 1) = 1 — a; 



(16) 



u=V2{fi-5) 

following the same procedure as before (see Fig. IS} we find an interval /i G 



fJ'est ^, fJ-est + ^ 



at the 68.3% CL, and /j, G /iest -75 
this is the result we wanted. 



) ^est + 
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at 95.4%, and so on, whatever the data: 
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Figure 3: Construction of confidence intervals using likelihood ratio ordering. 



What just happened? or, Why do the intervals differ? 

Comparing the "Feldman-Cousins" procedure (figure|SJ to the "Neyman" case (figureEJ, 
we see that the hkehhood ratio ordering keeps points close to the best estimate of ^, 
Mbest = |(oo + ^o) = ^^Oi along the u-axis, even if they are far away from the expected 
value of zero in the orthogonal coordinate v = —^{a — b) ... since the difference (oq — 60) 
is irrelevant to the estimation of the underlying parameter ^. Points with large \v\ are 
indeed improbable, but this does not help us discriminate between different fi values: 
only the relative likelihood along the n-axis does that. By ignoring this distinction, the 
apparently natural procedure of ordering- by-/ (a, b; n) produces intervals that vary with 
\v\, disappearing when it is large. At the price of computation (in the general case), 
ordering by the likelihood ratio in Eq. Q automatically makes the distinction between 
relevant (u) and irrelevant (v) information. From equations (jllj) . H13|) . and ()15() . it's 
clear that this will also work when making 3, 4, ... n measurements according to a single 
mean fi, and it should apply in general when the dimension of the data x = {xj} exceeds 
the dimension of the parameters = {Oj}.'^ 

Put another way, given that the problem is overconstrained — one parameter // for two 
measurements a and b — we can consider goodness-of-fit as well as parameter estimation, 
and ask two separate questions: 

1. Given our hypothesis, what can we say about the parameter /x? 

2. Is our hypothesis, of independent Gaussian measurements ^(o; fi, l)and^(6; jj., 1), 
consistent with the data? 

Likelihood ratio ordering concentrates ruthlessly on question 1. Question 2 is left for us 
to consider ourselves, as a matter of due diligence. Even if there is some suggestion of a 
poor fit to the data (|f | ^ 1), this is not a reason to bollocks up the confidence intervals 
in /x, since question 1 is valid on its own terms. ^ 

AppHcation to the B=^ -^ Y^(*)}^i*)± DaHtz analysis 

This is relatively straightforward. In Anton's analysis we are interested in finding 
{(p^, r, 6) — especially ips, of course — but in order to have well-behaved measurements'^ 
we choose to measure (x+, y+) from the B"*" sample and (x_, y_) from the B~ sample, 
where x± = r± cos(±(/)3 + S) and y± = r± sin(ib(/)3 + 5). 

This is one measurement too many. If we had infinite precision we would always 
find r_|_ = r_: in practice we always get r^ ^ r^, and a result that is "unphysical" in 



"^Strictly speaking, it's only clear that the logic used here will apply for cases where the likelihood 
can be factorised into functions of the "relevant" and the "irrelevant" coordinates. I'm not sure whether 
a more fundamental argument can be made. For the special case where the true parameter 6 can only 
take on two discrete values, the likelihood ratio condition Eq. @ gives the best possible discrimination 
between them, according to the Neyman-Pearson theorem. For more general cases, the likelihood ratio 
test is at least considered a good criterion to try by default. (See Eadie et al. from the Extra Reading.) 

^This distinction is discussed in section IV. C of the Feldman-Cousins paper: "An advantage of our 
intervals [is that they] effectively decouple the confidence level used for a goodness-of-fit test from the 
confidence level used for confidence interval construction ..." 

■^For practical reasons we want quantities whose PDF is approximately a Gaussian, with small (if 
any) bias. The cartesian coordinates meet these criteria, whereas the polar coordinates (esp. r) do not. 



that sense. ^ To get the example in this note, take fj. = r, a = r^, b = r^, forget about 
r needing to be positive, and drop the other quantities. That's it. 

I made a brief attempt to extend the analytic study to the full problem, to understand 
the effect on Neyman intervals in (j)^ and 6 as one moves towards or away from the 
physical region . . . and only managed to give myself a headache. Clearly the intervals 
will be affected, and presumably in the same sense as those for r (becoming narrower 
or wider as appropriate). But the full problem is substantially more complicated than 
averaging two numbers. 

As to the actual results: as you know, Anton uses three separate decay modes, so 
the measurement is done three times over, with (p^ in common. The B^ — > DK^ result 
is close to the physical case, whereas the B — > D*K and DK* results are not. 

Conclusion 

We already know how to average two numbers a and b. Since this problem is trivial, 
but also resembles estimating {(j)^, r, 6) from measurements (x+, y+, x_, y_) in one 
important aspect, it provides a suitable test-case for our statistical method. 

We find that likelihood-ratio ordering gives the correct confidence intervals for the 
(unknown true) value fi which the average is used to estimate. It does this by building 
a confidence belt along the A=[a -\-b) axis, which gives information about /x, and auto- 
matically ignoring the orthogonal axis which only has information on goodness-of-fit. 

If instead we order according to the p.d.f. ("Neyman" ordering) both axes are treated 
equally. If the measurement is close to the "physical" case a = b, the resulting confidence 
intervals are too wide; if the measurement is far away, |a — 6| ^ cr, the intervals are too 
narrow, and can even become empty, which is pathological. 

Further reading 

• A primer on confidence intervals, and statistics generally: 

Section 32 of the Review of Particle Physics, S. Eidelman et al. (PDG), Phys. Lett. 
B 592, 1 (2004). Notation on page [21 is chosen to match that in the reference. 

• [some Belle-internal things went here] 

• Intervals based on likelihood ratio ordering: 

Gary J. Feldman, Robert D. Cousins, "Unified approach to the classical statistical 
analysis of small signals", Phys. Rev. D 57, 3873 (1998). 

• Hypothesis testing, the Neyman-Pearson test, and related issues: 

Chapter 10 of "Statistical methods in experimental physics", W.T. Eadie, D. Dri- 
jard, F.E. James, M. Roos, B. Sadoulet (Amsterdam: North-Holland, 1971). There 
are doubtless better references, but this book is one of the standard ones used by 
particle physicists. 



^ Cf. the (j)2 analysis using B° — > tt+tt , where the dimensionahty of the parameters and the data was 
the same, and we had a physical region ^/ A^ + 5*^ < 1. 



About this note 

This note arose out of an intermittent discussion with Tim Gershon and others during 
Belle-internal review of Anton Poluektov's B -^ Y)i.*)]^(*)^ Dalitz analysis. An early 
version was shown during a refereeing meeting in July 2005. 

The fact that an unphysical region results when n measurements are used to estimate 
m parameters, m < n, and the problem this causes for Neyman intervals, was noticed 
and urged by Anton. It seems a lot more obvious in retrospect than it did at the time. 

The averaging example presented here is original, as far as I know. 



